- Analysis of the imported data
- Data cleaning
- Descriptive statistics
- Principal component analysis (PCA)
- Clustering
- Bonus: word cloud
January 2019
The dataset contains all sales made by an online store based in the United Kingdom. The store sells unique gifts for every occasion.
1. InvoiceNo: invoice number (a leading C indicates a cancellation)
2. StockCode: product identifier
3. Description: product description
4. Quantity: number of units ordered
5. InvoiceDate: invoice date and time
6. UnitPrice: unit price in pounds sterling
7. CustomerID: customer ID (NA for stock adjustments)
8. Country: recipient's country
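As a minimal sketch, assuming the rows are loaded into Python, one line of the dataset can be modelled as a dataclass. `SaleLine` and `is_cancellation` are hypothetical names introduced here, `None` stands in for NA, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SaleLine:
    """Hypothetical record mirroring the eight columns listed above."""
    invoice_no: str
    stock_code: str
    description: str
    quantity: int
    invoice_date: datetime
    unit_price: float            # price per unit, in pounds sterling
    customer_id: Optional[int]   # None stands in for NA
    country: str

    def is_cancellation(self) -> bool:
        # An InvoiceNo starting with "C" marks a cancelled transaction
        return self.invoice_no.upper().startswith("C")


# Made-up example line, not taken from the dataset
line = SaleLine("C000001", "00000", "EXAMPLE ITEM", -1,
                datetime(2011, 1, 1, 12, 0), 1.25, None, "United Kingdom")
print(line.is_cancellation())  # True
```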
Source: https://archive.ics.uci.edu/ml/datasets/Online+Retail
Period covered: 1/12/10 08:26 to 9/12/11 12:50 (day/month/year, i.e. 1 December 2010 to 9 December 2011)
541909 rows in the dataset
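Assuming the two bounds above are written day/month/two-digit-year as in the text, a short sketch parses them with `datetime.strptime` and measures the period they span:

```python
from datetime import datetime

# Format assumed from the text above: day/month/two-digit year, then HH:MM
FMT = "%d/%m/%y %H:%M"

start = datetime.strptime("1/12/10 08:26", FMT)
end = datetime.strptime("9/12/11 12:50", FMT)
span = end - start

print(start, end)  # 2010-12-01 08:26:00 2011-12-09 12:50:00
print(span)        # 373 days, 4:24:00
```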
Rows removed during cleaning (items 2 to 8 are StockCode values):
1. C: Cancellations (InvoiceNo starting with C)
2. C2: Carriage
3. D: Discounts
4. POST: Postage
5. M: Manual entries
6. BANK CHARGES: Bank charges
7. PADS: Packaging fees
8. DOT: DOTCOM POSTAGE
9. UnitPrice < 0
10. Quantity < 0
11. CustomerID = NA
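A minimal sketch of these rules in Python, assuming each row is a dict keyed by the column names, that item 1 refers to the `C` prefix of InvoiceNo, that items 2 to 8 are StockCode values, and that NA is represented by `None`. `keep_row` and `clean` are hypothetical helpers; exact duplicates are dropped after filtering, mirroring the two steps reported in the counts that follow.

```python
NON_PRODUCT_CODES = {"C2", "D", "POST", "M", "BANK CHARGES", "PADS", "DOT"}


def keep_row(row: dict) -> bool:
    """True if the row passes every cleaning rule listed above."""
    if str(row["InvoiceNo"]).startswith("C"):        # rule 1: cancellation
        return False
    if row["StockCode"] in NON_PRODUCT_CODES:        # rules 2-8: non-product lines
        return False
    if row["UnitPrice"] < 0 or row["Quantity"] < 0:  # rules 9-10
        return False
    return row["CustomerID"] is not None             # rule 11: CustomerID = NA


def clean(rows):
    """Apply keep_row, then drop exact duplicates, keeping the first copy."""
    seen, kept = set(), []
    for row in rows:
        if not keep_row(row):
            continue
        key = tuple(sorted(row.items()))             # hashable view of the row
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up rows, each exercising one rule
base = {"InvoiceNo": "500001", "StockCode": "10001",
        "Quantity": 2, "UnitPrice": 1.5, "CustomerID": 1}
rows = [
    base,
    dict(base),                          # exact duplicate
    {**base, "InvoiceNo": "C500002"},    # cancellation
    {**base, "StockCode": "POST"},       # postage line
    {**base, "Quantity": -3},            # negative quantity
    {**base, "CustomerID": None},        # missing customer
]
print(len(clean(rows)))  # 1
```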
541909 rows initially
396337 rows after cleaning
391150 rows after cleaning and duplicate removal
Share of rows removed: 27.81998%
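The percentage follows directly from the counts above. A quick check in Python, which also gives the number of duplicate rows removed in the second step:

```python
initial_rows = 541909
after_cleaning = 396337
after_dedup = 391150

duplicates = after_cleaning - after_dedup
removed_pct = 100 * (initial_rows - after_dedup) / initial_rows

print(duplicates)             # 5187
print(f"{removed_pct:.5f}%")  # 27.81998%
```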
i. Number of countries: 37 (caution: 37 + 'undefined')
ii. Number of unique invoices: 18402
iii. Number of unique customers: 4334
iv. Number of unique products: 3659
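Each figure is a count of distinct values in one column. A sketch, assuming the cleaned rows are dicts and that a product is identified by its StockCode (the text does not say whether StockCode or Description was used). A placeholder such as 'undefined' is counted like any other country, which is why item i needs its caveat.

```python
def distinct_counts(rows):
    """Number of distinct values for the four columns counted above."""
    return {
        "countries": len({r["Country"] for r in rows}),
        "invoices": len({r["InvoiceNo"] for r in rows}),
        "customers": len({r["CustomerID"] for r in rows}),
        "products": len({r["StockCode"] for r in rows}),
    }


# Made-up rows: two invoices, two customers, three products
rows = [
    {"Country": "France", "InvoiceNo": "1", "CustomerID": 10, "StockCode": "A"},
    {"Country": "France", "InvoiceNo": "1", "CustomerID": 10, "StockCode": "B"},
    {"Country": "undefined", "InvoiceNo": "2", "CustomerID": 11, "StockCode": "C"},
]
print(distinct_counts(rows))
# {'countries': 2, 'invoices': 2, 'customers': 2, 'products': 3}
```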
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
     0.38    157.14    301.65    474.80    463.07 168469.60
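R's `summary()` on a numeric vector prints these six values, taking quartiles from R's default (type 7) quantile definition; Python's `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation. The mean (474.80) lies above the third quartile (463.07), a sign of strong right skew driven by a few very large values such as the maximum of 168469.60. The text does not name the summarised variable, so this sketch runs on made-up values that reproduce the same pattern.

```python
from statistics import mean, quantiles


def six_number_summary(values):
    """Min, quartiles, mean and max, with R's default (type 7) quartiles."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return {"Min.": data[0], "1st Qu.": q1, "Median": med,
            "Mean": mean(data), "3rd Qu.": q3, "Max.": data[-1]}


# Made-up, right-skewed values: one large outlier pulls the mean above Q3
summary = six_number_summary([1, 2, 3, 4, 100])
print(summary["Mean"], summary["3rd Qu."])  # 22 4.0
```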
              NbOfProduct  Turnover NbOfCustomers NbOfInvoice
NbOfProduct     1.0000000 0.8167573     0.7181459   0.8766853
Turnover        0.8167573 1.0000000     0.5298723   0.7757819
NbOfCustomers   0.7181459 0.5298723     1.0000000   0.8919510
NbOfInvoice     0.8766853 0.7757819     0.8919510   1.0000000
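Like any correlation matrix, this one is symmetric with ones on the diagonal. A minimal sketch of a Pearson correlation matrix in plain Python, assuming R's `cor()` was called with its default Pearson method; the toy columns are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict {name: {name: r}} over all pairs of columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Made-up columns for illustration
cols = {"x": [1, 2, 3], "y": [2, 4, 6], "z": [3, 2, 1]}
m = correlation_matrix(cols)
print(m["x"]["y"], m["x"]["z"])  # 1.0 -1.0
```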
Importance of components:
                          Comp.1    Comp.2     Comp.3     Comp.4
Standard deviation     1.7951395 0.6977192 0.37087114 0.20495248
Proportion of Variance 0.8286495 0.1251803 0.03536882 0.01080142
Cumulative Proportion  0.8286495 0.9538298 0.98919858 1.00000000
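Each proportion is a squared standard deviation (a component variance) divided by the sum of all four; the sketch below recomputes those rows from the printed standard deviations. The variances sum to about 3.889 = 4 × 35/36 rather than 4. That is consistent with, though not confirmed by, `princomp()` (which uses the divisor n) being run on 4 standardized variables over n = 36 observations, 36 being the total of the counts 24 + 2 + 2 + 8 printed with the clustering results.

```python
from itertools import accumulate

# Standard deviations printed above for Comp.1 to Comp.4
sdev = [1.7951395, 0.6977192, 0.37087114, 0.20495248]

variances = [s ** 2 for s in sdev]
total = sum(variances)
proportion = [v / total for v in variances]
cumulative = list(accumulate(proportion))

print(round(total, 4))                    # 3.8889
print([round(p, 3) for p in proportion])  # [0.829, 0.125, 0.035, 0.011]
print([round(c, 3) for c in cumulative])  # [0.829, 0.954, 0.989, 1.0]
```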
Loadings:
              Comp.1 Comp.2 Comp.3 Comp.4
NbOfProduct    0.516  0.207  0.816  0.155
Turnover       0.470  0.672 -0.515  0.247
NbOfCustomers  0.474 -0.687 -0.221  0.504
NbOfInvoice    0.536 -0.182 -0.138 -0.813

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00
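In R's printed loadings, "SS loadings" is the column sum of squared loadings and "Proportion Var" divides it by the number of variables. Because each column is a unit-length eigenvector, every SS loading is 1 and every Proportion Var is 1/4 here, so these rows do not measure explained variance (that is the role of the "Importance of components" table). A check on the rounded loadings printed above:

```python
# Loadings printed above (rows: NbOfProduct, Turnover, NbOfCustomers, NbOfInvoice)
loadings = [
    [0.516, 0.207, 0.816, 0.155],
    [0.470, 0.672, -0.515, 0.247],
    [0.474, -0.687, -0.221, 0.504],
    [0.536, -0.182, -0.138, -0.813],
]

n_vars = len(loadings)      # number of original variables
n_comp = len(loadings[0])   # number of components
ss = [sum(row[j] ** 2 for row in loadings) for j in range(n_comp)]
proportion_var = [s / n_vars for s in ss]

print([round(s, 2) for s in ss])              # [1.0, 1.0, 1.0, 1.0]
print([round(p, 2) for p in proportion_var])  # [0.25, 0.25, 0.25, 0.25]
```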
                 Quantity Purchases NbOfCustomers Avg UnitPrice NbOfCountry
Quantity       1.00000000 0.7474641    0.61811828   -0.08826218  0.45721675
Purchases      0.74746408 1.0000000    0.68451970    0.04854800  0.48893816
NbOfCustomers  0.61811828 0.6845197    1.00000000   -0.05353721  0.78839734
Avg UnitPrice -0.08826218 0.0485480   -0.05353721    1.00000000 -0.04236737
NbOfCountry    0.45721675 0.4889382    0.78839734   -0.04236737  1.00000000
Importance of components:
                          Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
Standard deviation     1.7030522 1.0056865 0.8236076 0.4957014 0.40347502
Proportion of Variance 0.5802359 0.2023364 0.1357030 0.0491574 0.03256732
Cumulative Proportion  0.5802359 0.7825723 0.9182753 0.9674327 1.00000000
Loadings:
              Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Quantity       0.487         0.535  0.685
Purchases      0.504 -0.137  0.421 -0.638 -0.377
NbOfCustomers  0.536        -0.303 -0.225  0.755
Avg UnitPrice        -0.989         0.119
NbOfCountry    0.470        -0.665  0.240 -0.528
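R's `print()` method for loadings hides any value whose absolute value is below a cutoff (0.1 by default), which is why some rows above show fewer than five numbers: the missing entries are small, not zero. A sketch of that display rule; `format_loadings` is a hypothetical helper and the input row is made up.

```python
def format_loadings(row, cutoff=0.1, digits=3):
    """Blank out loadings with |value| < cutoff, as R's print method does."""
    return ["" if abs(v) < cutoff else f"{v:.{digits}f}" for v in row]


row = [0.5, -0.03, 0.25, -0.8, 0.09]  # made-up values
print(format_loadings(row))  # ['0.500', '', '0.250', '-0.800', '']
```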
               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
SS loadings       1.0    1.0    1.0    1.0    1.0
Proportion Var    0.2    0.2    0.2    0.2    0.2
Cumulative Var    0.2    0.4    0.6    0.8    1.0
Cluster  1  2  3  4
Size    24  2  2  8
Cluster centers
  NbOfProduct    Turnover NbOfCustomers NbOfInvoice
1  -0.5521902 -0.45333153    -0.3856213 -0.41757550
2   2.4050389  2.06025117     3.8117960  3.58162446
3   1.9319784  3.08121261    -0.2826544  1.23105399
4   0.5723161  0.07462864     0.2745785  0.04955689
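A minimal k-means sketch in plain Python using Lloyd's algorithm (alternate nearest-centre assignment and mean update), for illustration only. R's `kmeans()` defaults to the Hartigan-Wong algorithm with random starts, so this code is not the method behind the tables and will not reproduce them; the points are made up.

```python
import random
from collections import Counter
from math import dist


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # k input points as initial centres

    def nearest(p):
        # Index of the current centre closest to p (Euclidean distance)
        return min(range(k), key=lambda c: dist(p, centers[c]))

    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # New centre = coordinate-wise mean; keep the old centre if a cluster empties
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:    # converged: assignments no longer change
            break
        centers = new_centers
    return centers, [nearest(p) for p in points]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(sorted(Counter(labels).values()))  # [3, 3]
```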
Cluster    1    2    3    4    5
Size      10 1034  261    6 2348
Cluster centers
    Quantity   Purchases NbOfCustomers Avg UnitPrice NbOfCountry
1 12.0071871 12.20845692     4.7319949  -0.046732368   2.3980055
2  0.1480762  0.09963196     0.5012783  -0.079152150   0.8689780
3  1.7105748  1.81562237     2.5278296  -0.070607758   1.9178017
4 -0.3904298  0.34862815    -0.5670037  19.618540386  -0.6061676
5 -0.3054941 -0.29858315    -0.5204443  -0.007228267  -0.6045198
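The five counts 10, 1034, 261, 6 and 2348 sum to 3659, the number of unique products counted earlier, so this clustering appears to group products. The size-weighted average of a column of centres equals that variable's overall mean, and the sketch below finds it to be essentially 0 for every variable. That is consistent with, but not confirmed by the text, clustering on standardized variables; if so, the Avg UnitPrice centre of 19.6 for cluster 4 means roughly 19.6 standard deviations above the mean price, not £19.6.

```python
# Counts printed above for clusters 1 to 5
sizes = [10, 1034, 261, 6, 2348]

# Cluster centres printed above, one list per variable (clusters 1 to 5)
centers = {
    "Quantity":      [12.0071871, 0.1480762, 1.7105748, -0.3904298, -0.3054941],
    "Purchases":     [12.20845692, 0.09963196, 1.81562237, 0.34862815, -0.29858315],
    "NbOfCustomers": [4.7319949, 0.5012783, 2.5278296, -0.5670037, -0.5204443],
    "Avg UnitPrice": [-0.046732368, -0.079152150, -0.070607758, 19.618540386, -0.007228267],
    "NbOfCountry":   [2.3980055, 0.8689780, 1.9178017, -0.6061676, -0.6045198],
}

n = sum(sizes)
# Size-weighted mean of the centres = overall mean of each variable
weighted_means = {
    name: sum(s * c for s, c in zip(sizes, col)) / n
    for name, col in centers.items()
}

print(n)  # 3659
for name, m in weighted_means.items():
    print(f"{name:14s} {m:+.6f}")
```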